Causes and Effects of Unanticipated Numerical Deviations in Neural Network Inference Frameworks

Neural Information Processing Systems

Hardware-specific optimizations in machine learning (ML) frameworks can cause numerical deviations of inference results. Quite surprisingly, despite using a fixed trained model and fixed input data, inference results are not consistent across platforms, and sometimes not even deterministic on the same platform. We study the causes of these numerical deviations for convolutional neural networks (CNN) on realistic end-to-end inference pipelines and in isolated experiments. Results from 75 distinct platforms suggest that the main causes of deviations are differences in SIMD use on CPUs and the runtime selection of convolution algorithms on GPUs. We link the causes and propagation effects to properties of the ML model and evaluate potential mitigations. We make our research code publicly available.
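As one illustration of the GPU-side cause, here is a hedged PyTorch sketch (assuming PyTorch as the inference framework; the paper studies realistic pipelines, and the toy CNN below is not the paper's model) that disables cuDNN's runtime convolution-algorithm selection, requests deterministic kernels, and checks whether two runs agree bitwise:

```python
import torch
import torch.nn as nn

# Request deterministic kernels and disable cuDNN autotuning, which can
# otherwise pick a different convolution algorithm on every run.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # toy CNN standing in for a trained model
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
).eval()
x = torch.randn(1, 3, 32, 32)         # fixed input data

with torch.no_grad():
    out1, out2 = model(x), model(x)

# On a fixed platform the two runs should now match bitwise. Across
# platforms (different SIMD widths, different GPU kernels), per the
# abstract above, deviations may remain.
print(torch.equal(out1, out2), (out1 - out2).abs().max().item())
```

Note that these switches only remove run-to-run nondeterminism on a fixed platform; they do not address cross-platform deviations from differing SIMD use.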



Training Uncertainty

Neural Information Processing Systems

The first subset (in red) is utilized to evaluate a traditional accuracy-based loss function ℓ_a, such as the cross entropy. This benchmark is based on a loss function designed to incentivize the trained model to produce the smallest possible conformal prediction sets with the desired coverage (e.g., 90% if α = 0.1). The hybrid training procedure is similar to Algorithm 1, in the sense that it relies on analogous soft-sorting, soft-ranking, and soft-indexing algorithms to evaluate a differentiable approximation Ŵ_i of the conformity score W_i in (8). Above, the second equality follows directly from the fact that S(x, U; π, t), defined in (A2), is by construction increasing in t, and therefore Y ∉ S(x, U; π, 1−α) if and only if min{t ∈ [0,1] : Y ∈ S(x, U; π, t)} > 1−α. The proof consists of showing that ℓ_a and ℓ_u are separately minimized by π̂ = π, although only approximately in the latter case.
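For context, a minimal NumPy sketch of standard split-conformal prediction sets with the desired coverage (e.g., 90% if α = 0.1). This is the plain non-differentiable construction, not the paper's hybrid training loop with soft-sorting, and the particular conformity score (one minus the true-class probability) is an assumption for illustration:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for classification.

    cal_probs:  (n, K) predicted class probabilities on calibration data
    cal_labels: (n,)   true calibration labels
    test_probs: (m, K) predicted probabilities on test points
    """
    n = len(cal_labels)
    # Conformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Include every label whose score would fall below the threshold.
    return test_probs >= 1.0 - q  # (m, K) boolean set-membership matrix

# Toy usage: sets should cover the true label ~90% of the time for alpha = 0.1.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=200)
labels = np.array([rng.choice(5, p=p) for p in probs])
sets = conformal_sets(probs[:100], labels[:100], probs[100:])
print("avg set size:", sets.sum(axis=1).mean())
print("coverage:", sets[np.arange(100), labels[100:]].mean())
```

The loss described above pushes the model so that sets built this way are as small as possible while retaining coverage.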


A Extension to k-Means and (k, p)-Clustering

Neural Information Processing Systems

The lower bound on opt(U) given in Lemma B.10 holds for ρ-metric spaces with no modifications. By making the appropriate modifications to the proof of Theorem C.1, we can extend this theorem to ρ-metric spaces. In particular, we can obtain a proof of Theorem A.5 by taking the proof of Theorem C.1 and adding extra ρ factors whenever the triangle inequality is applied. We first prove Lemma B.1, which shows that the sizes of the sets U [...]. By Lemma B.2, we get that [...]. Henceforth, we fix some positive ξ and sufficiently large α such that Lemma B.3 holds. By now applying Lemma B.4, it follows that [...]. The following lemma is proven in [25]. [...] the first inequality follows from Lemma B.1, the third inequality follows from Lemma B.7, and the fourth inequality follows from the [...]. The second inequality follows from Lemma B.8, the third inequality from averaging and the choice of [...]. Proof of Lemma 3.3: It follows that with probability at least 1 − e^{−[...]} [...]. Hence, by Theorem D.1, any algorithm with O(poly(k)) query time must have Ω(k) amortized update time.


Fully Dynamic k-Clustering in Õ(k) Update Time

Neural Information Processing Systems

Clustering is a fundamental problem in unsupervised learning with several practical applications. In clustering, one is interested in partitioning elements into different groups (i.e., clusters).
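A minimal NumPy sketch of the static (k, p)-clustering objective from the companion appendix above; the fully dynamic Õ(k)-update-time algorithm itself is not reproduced here, and the toy data and k = 5 are arbitrary:

```python
import numpy as np

def clustering_cost(points, centers, p=1):
    """(k, p)-clustering objective: sum over points of the distance to the
    nearest center, raised to the power p (p = 1 gives k-median, p = 2 k-means)."""
    # Pairwise Euclidean distances between all points and all centers.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return (d.min(axis=1) ** p).sum()

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2))
ctrs = pts[rng.choice(len(pts), size=5, replace=False)]  # k = 5 arbitrary centers
print("k-median cost:", clustering_cost(pts, ctrs, p=1))
print("k-means  cost:", clustering_cost(pts, ctrs, p=2))
```

A dynamic algorithm maintains a set of centers approximately minimizing this cost as points are inserted and deleted, with the update time bounding the work per insertion or deletion.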



Supplementary Material

Neural Information Processing Systems

We provide additional results for EGTA applied to networked MARL system control for CPR management. Restraint percentages under different regeneration rates: the heatmaps in Figure 7 (A-C) highlight the differences in restraint percentage for different values of α as the regeneration rate is changed from high (0.1) to low. In the case where agents are completely self-interested (α = 0), shown in (A), the majority of algorithms without communication display very low levels of restraint for all rates of regeneration. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents. Schelling diagrams using a different parameterisation: an alternative parameterisation for a Schelling diagram is to plot payoffs for a particular agent (cooperating or defecting) with respect to the number of other cooperators on the x-axis, instead of the total number of cooperators.
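A hedged matplotlib sketch of this alternative parameterisation, with hypothetical linear payoff curves standing in for the measured ones (the group size N and the payoff coefficients are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

N = 10                       # group size (hypothetical)
others = np.arange(N)        # number of *other* cooperators on the x-axis

# Hypothetical payoff curves; empirical estimates from the simulations
# could be substituted. Defectors free-ride, so their curve sits above.
payoff_coop = 1.0 + 0.8 * others
payoff_defect = 2.0 + 0.8 * others

plt.plot(others, payoff_coop, label="cooperate")
plt.plot(others, payoff_defect, label="defect")
plt.xlabel("number of other cooperators")
plt.ylabel("expected payoff")
plt.title("Schelling diagram (alternative parameterisation)")
plt.legend()
plt.show()
```

Plotting against the number of other cooperators keeps the two curves on a common x-axis, which makes the incentive gap between cooperating and defecting at each population state directly readable.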